Validity and reproducibility of a whole‐room indirect calorimeter for the estimation of VO2 , VCO2 , and resting metabolic rate

Abstract Whole‐room indirect calorimeters (WRICs) provide accurate instruments for the measurement of respiratory exchange, energy expenditure, and macronutrient oxidation. Here, we aimed to determine the validity and reproducibility of a 7500 L WRIC for the measurement of ventilation rates and resting metabolic rate (RMR). Technical validation was performed with propane combustion tests (n = 10) whereas biological reproducibility was tested in healthy subjects (13 women, 6 men, mean ± SD age 39.6 ± 15.3) in two 60 min measurements separated by 24 h. Subjects followed a run‐in protocol prior to measurements. The coefficient of variation (CV) and intraclass correlation coefficient (ICC) were calculated for ventilation rates of O2 (VO2), CO2 (VCO2), the respiratory quotient (RQ; VCO2/VO2), and RMR. Technical validation showed good validity with CVs ranging from 0.67% for VO2 to 1.00% for energy expenditure. For biological reproducibility, CVs were 2.89% for VO2; 2.67% for VCO2; 1.95% for RQ; and 2.68% for RMR. With the exception of RQ (74%), ICCs were excellent for VO2 (94%), VCO2 (96%) and RMR (95%). Excluding participants that deviated from the run‐in protocol did not alter results. In conclusion, the 7500 L WRIC is technically valid and reproducible for ventilation rates and RMR.

WRICs can be used to study energy requirements in humans with different metabolic conditions and to study the effects of different dietary exposures on the various components of energy expenditure (Hall et al., 2019(Hall et al., , 2021Müller et al., 2021).
Resting metabolic rate (RMR) makes up about 70%-85% of TDEE, whereas physical activity accounts for 15% to 30%, and is otherwise influenced by age, gender, hormones, body size, body composition, physical activity, dietary factors, and diseases (Bosy-Westphal et al., 2009;Heymsfield et al., 2018Heymsfield et al., , 2019Heymsfield et al., , 2021Ruggiero & Ferrucci, 2006). Therefore, accurate tools for the measurement of RMR are important in research and the clinic. Easy-to-use tools for the measurement of RMR include ventilated hoods that offer time-and cost-effective alternatives to a 24-h measurement in a WRIC, and predictive equations that are commonly used in clinical practice. Both methods are less accurate and imprecise compared to whole-room indirect calorimetry (Rising et al., 2015). In this regard, smaller WRICs offer an attractive alternative as it has shorter response times compared to WRICs built for 24-h measurements (Chen, Smith, et al., 2020). The short response time enables the detection of metabolic changes in a study subject or patient within 30 to 60 min with a high degree of accuracy (Rising et al., 2015(Rising et al., , 2017. At the Department of Nutrition, University of Oslo, Norway, a small WRIC with an interior volume of 7500 Ls has been built to facilitate measurements of RMR. According to newly established guidelines for using and reporting WRIC facilities, it is recommended that validation and reproducibility experiments on energy expenditure (EE) and metabolic rates are performed and published (Chen, Smith, et al., 2020). The aim of this study was therefore to validate a WRIC intended for measurements of RMR. Validation was performed using propane gas combustion tests (technical validation), and two 60 min measurements separated by 24 h in healthy men and women to ascertain the reproducibility/biological validity of VO 2 , VCO 2 , RMR, and the respiratory quotient (RQ).

| System characteristics, data acquisition, and processing
Three rooms have been built at the Department of Nutrition, University of Oslo, Norway for the measurement of metabolic rates over 24 h, during exercise and at rest. The present study measured and reports RMR (kcal/d) and the respiratory quotient (RQ; L (V) of CO 2 /L of O 2 ) measured using the WRIC specific for RMR. This involved the determination of oxygen (O 2 , %), carbon dioxide (CO 2 , %), and water vapor pressure (WVP, kPa) according to previously published protocols (Rising et al., 2015(Rising et al., , 2017. The RMR-WRIC chamber has an interior volume of 7500 L after accounting for space taken up by a recliner, sink, and toilet. It has one window with a view of outside surroundings and one window facing the control room. The door is equipped with a blood sample port that was not in use for this particular study. Fresh outside air is infused into the buffer spaces surrounding the WRIC by the main building's heating, ventilation, and air-conditioning system. This system ensures a slight overpressure to prevent any outwards leaks from the WRIC and maintains an air temperature of 22°C. Air is directed from the buffer space into the WRIC at floor level. The WRIC is equipped with a heat pump unit (LG D09TR NSJ, LG Electronics) that maintains a temperature of 22°C and set at a low fan speed to mix respiratory gases. Air is continuously pulled from the RMR-WRIC at a constant flow rate of 100 L/min, and O 2 , CO 2 , and WVP concentrations are measured by a Promethion GA3m2/FG-250 (Sable Systems International) integrated metabolic instrumentation. This instrumentation contains dual CO 2 , O 2 , and WVP sensors that alternate between measuring gas concentrations in the incurrent (baseline air from the buffer space) and excurrent air streams, thus maintaining a continuous measurement of the subject's respiratory exchange within the RMR-WRIC.
Raw data were acquired by Caloscreen 1.3.16 (Sable Systems International) and processed by ExpeData 1.9.27 (Sable Systems International). This software contains a macro code for mathematically processing the data including signal preconditioning, WVP dilution correction, and conversion to standard pressure, as well as baseline drift correction from incurrent air measurements, mathematically drying the air stream, background baselining for each of the two analyzers and correcting for room response using z-transformation (Bartholomew et al., 1981). The theoretical basis for the equations is published by Lighton (2018). After data acquisition and processing, RMR is calculated using the Weir equation (Weir, 1949). Mean kcal/min was multiplied with 1440 to convert to kcal/day.

| Calibration and equilibration of the instruments
Zero calibration of the gas analyzers of the GA3m2/ FG-250 was performed by first infusing nitrogen gas for 25 min and adjusting any CO 2 , O 2 , and WVP readings to zero. Secondly, a certified span gas with a CO 2 concentration of 1.04% was infused for 5 min, and in the case of any deviation from the known concentrations of CO 2 , the respective sensors were adjusted to 1.04%. Finally, an automated equilibration procedure was performed to zero the WVP sensors in a process where incurrent air is passed through the desiccant canister. The span for WVP is calculated as the amount of WVP that would be present if the O 2 readings were 20.94%, using the current WVP readings when the instruments are sampling incurrent air. Additional technical details can be found in (Lighton, 2018;Rising et al., 2015Rising et al., , 2017. A column desiccant (Drierite, Fisher Scientific) canister was used to chemically dry the air stream in order to zero the WVP sensors during calibration. Calibration and equilibration were performed prior to each day of testing.

| Quality control and technical validation
Ten one-h propane combustion tests using instrumentgrade propane (99.2% purity) (Airgas Healthcare) were performed in accordance with methods outlined in (Rising et al., 2017). Briefly, propane consumed (g) was measured by a calibrated scale (Sartorius Lab Instruments GmBH & Co), and the weight of the propane was noted at the start and end of the 1-h measurement for the calculation of the propane burn rate (g/min).

| Study participants for reproducibility/biological validation
For the reproducibility/biological validation study, students and staff at the Department of Nutrition, University of Oslo were recruited for two consecutive measurements separated by 24-h between March and June in 2022. In total, 20 participants (6 men and 14 women) were included. One participant was excluded from the analysis because of illness at measurement Day 2, bringing the final number of participants to 19 (6 men and 13 women) ( Table 2). Exclusion criteria were body mass index <18 kg/ m 2 , medications that affect energy metabolism (e.g., Levaxin), smoking, chronic disease with known effects on energy expenditure, pregnancy, lactation, unstable weight the past 3 months (± 5%) and self-reported claustrophobia. Participants gave written informed consents, and a letter of exempt was provided from the Regional Ethical Committee, as the project was considered methodological and not health research (Reference number: 368847). The project was approved by the Data Protection Services (SIKT) (Reference number: 359371).

| Study protocol
An overview of the study protocol can be seen in Figure 1a. Three days (−72 h) prior to each measurement, participants were advised to follow a diet in line with the foodbased dietary guidelines from the Norwegian Directorate of Health (Helsedirektoratet, 2015). Guidelines included eating the following: 1. Three to five meals per day 2. Five portions of fruits, berries and vegetables per day 3. Four portions of whole-grain products per day 4. Three portions of low-fat dairy per day 5. Meat-, fish-, or vegetarian-based dinner 6. Potatoes, rice, or pasta for dinner, in the amounts needed to feel full 7. Use vegetable oils instead of butter for cooking 8. Avoid sugar-sweetened beverages 9. Avoid alcohol F I G U R E 1 (a) Overview of the study design and (b) picture of the RMR-WRIC at the Department of Nutrition, University of Oslo.
One day (−24 h) prior to each measurement, participants were asked to refrain from caffeine intake (coffee, tea, energy drinks, soda drinks) and vigorous exercise such as going to the gym or for a run. The participants were required to fast for 12 h and advised to sleep for 6 to 8 h prior to the measurement. Protocol adherence was checked by (1) a 24-h dietary recall at the first measurement day, and (2) activity and sleep monitoring the final 24 h before measurements using SenseWear Armband Mini (BodyMedia), which has been validated for measuring physical activity and sleep in several populations (Berntsen et al., 2010;Cereda et al., 2007;MacKey et al., 2011;Reeve et al., 2014). Physical activity and sleep monitoring, and resting heart rate measurements were taken prior to each of the two measurements. The subjects were asked to maintain their self-reported diet on the day before the second measurement. Significant protocol deviations that warranted sensitivity analyses were any consumption of caffeinated drinks, > 10 min of vigorous physical activity and nonfasting measurements.
On the measurement days, height and weight measurements using a wireless scale (Seca 285 (Birmingham, UK)), and 24-h recall interviews were performed immediately on arrival on Day 1. Brand names and recipe ingredients were registered. The food items were quantified by household measures and a booklet containing photographs of foods in different portion sizes (Norkost, 2012). Intake of foods were recorded as consumed and coded in the "Food composition database and food and nutrient calculation system (KBS)," at the department of Nutrition, University of Oslo. Participants rested for 30 min, and heart rate was recorded using a Polar wrist watch coupled to a heart rate monitor (Polar A300, Device ID 9F7511A, Firmware 1.2.135, HW model 00753320.07) before entering the WRIC. Sixty-minute measurements started 5 min after closing the chamber door. The measurement was repeated after 24 h to minimize the influence of weight fluctuations and the menstrual cycle (Bisdee et al., 1989;Henry et al., 2003).

| Statistical analysis
Distribution of continuous variables were checked by visual inspection of histograms and QQ-plots. For the technical validation, expected vs. measured VO 2 , VCO 2 , energy expenditure, and RQ were compared using independent t-tests. For reproducibility, the primary outcome variables were VO 2 , VCO 2 , RMR, and RQ. Repeated measures were compared using paired t-tests. Correlational analyses for Day 1 and Day 2 were performed using Pearson correlation. Measures of accuracy and precision included crude relative mean differences between the two measurements and the coefficient of variation (CV, standard deviation [SD]/mean×100) for each subject. The intraclass correlation coefficient (ICC) was derived from linear mixed regression models with the outcome variable as the dependent variable, no independent variable, and a random term for subject ID. Pearson correlation coefficients and ICCs >0.9 were defined as strong or excellent based on terminology used in a previous report (Allerton et al., 2021). Bland-Altman plots were made to visualize limits of agreement (mean difference ± 1.96 SD), and presence of outliers in the data. To explore proportional bias, linear regression models were constructed where the difference between Days 1 and 2 for each outcome was regressed on the mean value of each parameter for each subject. A p-value <0.05 was considered significant. All statistical analyses were conducted in R version 4.2.1. R packages used for data wrangling, plotting and statistical analysis included the tidyverse package compilation, data. table, patchwork, and lmer.
In the biological reliability study, subjects with poor adherence to the recommendations of caffeine intake and being vigorously physical active the day prior to the measurement were kept in the full analyses to judge the effect of varying protocol adherence on the measurements. Sensitivity analyses excluding these subjects were performed in order to assess whether protocol deviance affected the main results.

| Technical validation
For the propane combustion tests (n = 10), the mean (SD) amount of propane burned was 13.79 ± 2.16 g with an average burn rate of 0.2279 ± 0.037 g/min. Details from the tests are given in Table 1. Briefly, the relative differences between measured and expected values ranged from −0.55 (1.11) % for VO 2 to 1.32 (1.07) % for energy expenditure, whereas the CVs ranged from 0.67% for VO 2 to 1.00% for energy expenditure. There was a near-perfect correlation between expected and measured values (r > 0.99).
Of the 19 participants, one reported vigorous physical activity (biking to work) whereas five consumed caffeinated beverages −24 h before measurements. These were kept in the final analyses but excluded in sensitivity analyses reported under Sensitivity analyses and sources of RQ variability.

| Biological reliability
Mean values, relative differences, deltas, CVs, ICCs, and correlation coefficients for each measurement day are shown in Table 3. VO 2 , VCO 2 , and RMR showed excellent reliability as assessed by ICC (94%-96%), with CVs at 2.89% for VO 2 , 2.67% for VCO 2 , and 2.68% for RMR. There were no significant differences between Day 1 and Day 2 for VO 2 , VCO 2 , and RMR. For RQ, ICC was lower at 74% with a CV of 1.95%, and there was a significant difference between Day 1 and Day 2 (mean difference = 0.02, p = 0.004).
The Bland-Altman plots in Figure 2 illustrate the mean difference, 95% limits of agreement and the regression line for proportional bias. For VO 2 , the limits of agreement ranged from −42.1 to 30.1 L/day; for VCO 2 from −25.9 to 24.1 L/ day; for RQ from −0.06 to 0.03; and for RMR from −143 to 189. There was no evidence of proportional bias indicating that there were no significant changes in the individual differences with higher values of VO 2 , VCO 2 , RQ, or RMR (all p > 0.90).

RQ variability
With the exception of a decrease in the ICC for RQ from 74 to 69%, all parameters were generally similar in sensitivity analyses excluding non-compliant subjects (Table 4).
We and others (Allerton et al., 2021;Delsoglio et al., 2020) have observed less reliable measures of RQ following repeated measures using indirect calorimetry, possibly by accumulating random errors in VO 2 and VCO 2 measurements. Therefore, we explored whether the day-to-day variability in VO 2 and VCO 2 could predict the variability in RQ. There was no correlation between VCO 2 residuals and RQ residuals (data not shown). However, there was a significant, inverse association between VO 2 residuals and RQ residuals (Pearson's r = −0.52, p < 0.001).

| DISCUSSION
The aim of this study was to evaluate a 7500 L WRIC in estimating VO 2 , VCO 2 , RMR, and RQ using propane combustion and repeated measurements within subjects, to allow for short-term measurements of RMR which is less labor-intensive than full 24-h measurements. Overall, measured vs. expected values for the propane combustion tests were satisfactory, and VO 2 , VCO 2 , and RMR showed excellent test-retest reliability for the 19 subjects with ICCs ranging from 94 to 96%. For RQ, ICC was modest at 74% with a CV of 1.95%. These estimates were robust to protocol deviations in some participants.
For the technical validation, our results are comparable to a previous study (Rising et al., 2015) and were within ± 1.5% of expected values calculated from propane stoichiometry for all measures. For reliability, we observed slightly higher T A B L E 1 Results from the technical validation (n = 10) a . ICCs and lower CVs for RQ and RMR compared to repeated measurements taken in a 24-h WRIC (Allerton et al., 2021). Although this observation may be due to differences in protocol, sample size, and instrumentation, smaller WRICs have shorter response times and require smaller correction factors for room volume during short-term measurements, possibly increasing precision (Chen, Smith, et al., 2020). Related to this, a study comparing 40-min measurements of RMR in one large (26,000 L) and one small (5500 L) WRIC observed a relative difference of ~4% between the two measurements (Chen, Scott, et al., 2020), possibly due to the difference in response time.
In contrast to VO 2 , VCO 2 , and RMR, repeated measures of RQ showed lower test-retest reliability and weaker correlation coefficients between measurements. The reason for this is uncertain, although some explanations have been offered by other authors. Desoglio et al. discuss that the lower reliability between measurements in RQ is due to the fact that random errors in between-day measurements of VO 2 and VCO 2 accumulate and leads to inaccurate RQ measurements (Delsoglio et al., 2020) especially over short observation periods (Chen, Smith, et al., 2020). In addition, RQ is a ratio with a physiological range of 0.7 to 1.2 and small deviations in VO 2 and VCO 2 can therefore produce substantial noise. Indeed, longer measurements tend to produce higher ICCs for 24-h RQ (Allerton et al., 2021;Dörner et al., 2022) whereas basal, sleeping, and resting RQ appear less reliable (Allerton et al., 2021). We observed a significant inverse correlation between scaled VO 2 and scaled RQ residuals, indicating that as the variability in VO 2 increases, so does the variability in RQ, whereas this was not observed for VCO 2 . As opposed to CO 2 analyzers, the O 2 analyzers are susceptible to any remaining water vapor in the sample gases after WVP dilution correction (Lighton, 2018). Thus, dayto-day variation in chamber humidity can produce small errors in the estimation of VO 2 , which may in turn compound with further calculations and impact the variability of RQ. The extension of this effect of VO 2 variability on RQ is unreliable estimates of macronutrient oxidation. It has been shown that even small errors in VO 2 can produce large errors in estimates of carbohydrate and fat T A B L E 2 Baseline and run-in characteristics of the participants.

Mean ± SD Median (min, max)
Age, y 39. oxidation (Livesey & Elia, 1988), which therefore represents a limitation of the current approach. Although we show excellent reliability for components of energy expenditure, our study has some limitations.
While we attempted to control dietary intake by giving food-based advice, a greater degree of control could have been achieved if meals for the run-in period had been prepared and tailored to the requirement of each subject.   Mean RMR (kcal/day) Difference in RMR (kcal/day, Day 2 − Day 1) Although the subjects were asked to maintain their selfreported diet, an additional 24-h recall interview might have been performed to enforce greater control of the dietary data. Adherence to the dietary advice was acceptable during the run-in, except for five participants who consumed caffeinated drinks and one who performed vigorous physical activity for 19 min the day before measurement in the WRIC facility. However, sensitivity analyses revealed no influence of protocol deviations on the measured parameters. An additional measurement day and measurements under different conditions (e.g., thermic effect of food) would also have been desirable to increase accuracy, precision and usability, and to be in line with previous validation studies (Allerton et al., 2021;Dörner et al., 2022;Schoffelen & Westerterp, 2008), but was not possible due to resource constraints.

| CONCLUSION
In conclusion, we have demonstrated that a 7500 L WRIC measures VO 2 , VCO 2 , and RMR with excellent reliability, whereas RQ was measured with acceptable reliability. These findings are of crucial importance for future research and clinical applications with RMR as the outcome.

AUTHOR CONTRIBUTIONS
TO conceived the idea for the study; HBH, CH, RD, FH, SMU, and TO designed the study; HBH, CH, ARSS, DTA, EMSA, and TO collected data; RR customized the macro files used to estimate metabolic rates; ABW and RB made intellectual contributions; HBH and TO drafted the paper; All authors read, revised, and approved the final version of the paper.
T A B L E 4 Accuracy and reliability of repeated measurements in sensitivity analyses excluding non-compliant subjects (n = 14).